To forecast a country’s vaccination rate from past vaccine distribution data, we use time series analysis to model the trend. Such a model helps us predict when a country can meet the WHO’s target of 70% vaccination by June 30, 2022. The given data is the weekly number of vaccine doses administered by each country, so we conducted a univariate analysis to fit a model. We chose the exponential smoothing model proposed by Holt and Winters, which, among other advantages, lets us assign more weight to the more recent data points. As more observations become available, we expect the fit to improve and the margin of error to narrow.
We considered several factors before deciding on a model. From a preliminary analysis of the data, we found the following:
The ARIMA model accounts for a changing mean and, in its seasonal form, for seasonality. It is particularly useful when we want to model fluctuations about the mean and make the series stationary through differencing. However, since the data show no visible seasonality but do show a clear upward linear trend in the mean, we instead fitted the data parametrically, with a focus on the more recent observations.
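As a small illustration of the differencing step mentioned above, the sketch below shows how a series with a linear trend in the mean becomes roughly constant after first differencing. The coverage values are made up for illustration and are not taken from the dataset.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical cumulative coverage (% of population), one value per week.
coverage = [2.0, 4.1, 5.9, 8.0, 10.1, 12.0, 13.9, 16.0]

# The upward trend in the mean disappears: every value is close to 2.
weekly_change = difference(coverage)
print(weekly_change)
```

The differenced values fluctuate around a constant, which is the kind of stationary series an ARIMA model is fitted to.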
Exponential smoothing continuously revises a forecast as more recent observations arrive. In practice, the weights of older observations decay exponentially, so their influence on the forecast diminishes. The Holt-Winters and ARIMA models generated similar results and error margins.
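The weighting scheme described above can be sketched in a few lines. This is a minimal sketch of Holt's linear (level plus trend) smoothing, not the code used for the projections. It leaves out the seasonal component, on the assumption that none is needed when no seasonality is visible. The function names and the initialization (first observation as the level, first difference as the trend) are illustrative choices.

```python
def smoothing_weights(alpha, count):
    """Weights given to the latest `count` observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(count)]


def holt_fit(series, alpha, beta):
    """Run Holt's linear exponential smoothing and return (level, trend).

    alpha smooths the level, beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first value and first difference.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the one-step-ahead prediction.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the newest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def holt_forecast(level, trend, steps):
    """Project the final level and trend `steps` periods ahead."""
    return [level + h * trend for h in range(1, steps + 1)]


# Older observations receive geometrically smaller weights.
print(smoothing_weights(0.5, 3))

# Hypothetical, perfectly linear series: the forecast continues the line.
level, trend = holt_fit([10.0, 12.0, 14.0, 16.0, 18.0], alpha=0.8, beta=0.2)
print(holt_forecast(level, trend, 3))
```

The closer alpha is to 1, the faster the weights fall off, so the level follows the latest observation more closely.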
Projections for all countries with available data are shown below.
A few examples of projection results for selected countries are shown below. Note that the upper and lower bounds grow wide because of the small number of data points and the length of the projection. These projections use an alpha (level smoothing) value of 0.995 and a beta (trend smoothing) value of 0.5, so as to take into account the average vaccination rate (slope) throughout all the available data points.
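Once a projection is available, the week in which the 70% target is reached can be read off the fitted line. The sketch below assumes the fitted model is summarized by a final level and a weekly trend. The values of `level`, `trend`, and `last_observation` are hypothetical placeholders, not results for any country.

```python
import math
from datetime import date, timedelta


def weeks_to_target(level, trend, target):
    """Whole weeks ahead at which level + h * trend first reaches target.

    Returns 0 if the target is already met and None if the trend is not
    positive, in which case the projection never reaches the target.
    """
    if level >= target:
        return 0
    if trend <= 0:
        return None
    return math.ceil((target - level) / trend)


# Hypothetical fitted values: coverage in %, trend in points per week.
level, trend = 38.0, 1.5
last_observation = date(2022, 1, 10)  # hypothetical date of the last data point
deadline = date(2022, 6, 30)          # WHO target date

weeks = weeks_to_target(level, trend, target=70.0)
projected = last_observation + timedelta(weeks=weeks)
print(weeks, projected, projected <= deadline)
```

This uses only the point forecast. With wide upper and lower bounds, the projected week is better read as a range than as a single date.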
Our biggest challenge is the lack of data. Most countries have fewer than 52 observations (Guatemala, our initial case study, has 43). We also had to split this data into training and testing sets, which further limits the data available for fitting. At the time of writing (January 2022), we want to forecast more than 20 weeks into the future (to reach June 30, 2022), so a large margin of error is inevitable. However, given how the model is fitted, we expect the margin of error to decrease as more data becomes available.
We are also still analyzing the weekly data to see whether it yields a better model than the cumulative data, especially by revealing seasonal patterns that were previously hidden. We are in the process of obtaining the vaccine supply data, which would give us more information to fit a multivariate model. One possibility is a neural network approach: neural networks can learn noisy and nonlinear relationships and can produce multivariate and multi-step forecasts.
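The training and testing split mentioned above has to respect time order: the model is fitted on the earlier weeks and scored on the later ones. The sketch below illustrates this with a hypothetical 43-point series and a simple drift baseline standing in for the fitted model. The function names, the series, and the test size of 8 weeks are assumptions made for illustration.

```python
def chronological_split(series, test_size):
    """Split a time series into (train, test) without shuffling."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def drift_forecast(train, steps):
    """Baseline stand-in for the model: extend the average training slope."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + h * slope for h in range(1, steps + 1)]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between held-out values and forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series with 43 weekly observations (made-up values).
series = [2.0 * week for week in range(43)]

train, test = chronological_split(series, test_size=8)
holdout_error = mean_absolute_error(test, drift_forecast(train, len(test)))
print(len(train), len(test), holdout_error)
```

Holding out 8 of 43 weeks leaves only 35 for fitting, which shows how quickly a short series is used up.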